An introduction for the R novice
Welcome!
Richard Layton
Rose-Hulman Institute of Technology
Fall 2016
Trying to practice what I preach, I have made the course materials reproducible.
Introductions
Handouts
Write down your ideas in response to Mystery question 1:
What is reproducible research?
Research is reproducible when the data and the code used to obtain a finding are available and sufficient for an independent researcher to recreate the finding.
computational, data-intensive
spans the full data, analysis, & publication workflow
most of us have received only perfunctory training (if any)
Christopher Gandrud, Reproducible Research with R and RStudio, 2/e, CRC Press, 2015.
More accountability is needed because of cases such as these:
The primary findings were false; the major effect disappeared after correcting for:
coding errors
selective exclusion of available data
unconventional weighting of summary statistics
Anil Potti falsified data to obtain the research outcomes he wanted, resulting in:
retracted journal articles (11 to date)
terminated clinical trials
cancelled research funding
civil suit by patients
Ivan Oransky, It’s official: Anil Potti faked cancer research data, say Feds, Retraction Watch, 2015-11-07.
Scientists and skeptics are in a knife fight, and you don’t bring data to a knife fight.
— Paul Ehrlich
Why should I make the data available to you, when your aim is to try and find something wrong with it?
— Phil Jones
Brad Keyes, Mann retirement: Analysis, reax, Climate Sceptic, 2016-05-08.
Jeff Leek, De-weaponizing reproducibility, 2015-03-13.
If you do anything “by hand” once, you’ll do it 100 times.
— Paul Wilson, UW–Madison
Your closest collaborator is you, six months ago. Have you tried to email that slacker?
— Karl Broman, UW–Madison
To preserve sanity, stop collaborating via email, attachments, and tracking changes in Word.
— Jenny Bryan, UBC
Write scripts (avoid manual copy, paste, mouse-clicks)
Plan the organization and naming scheme for files
Strive for simplicity, readability, reusability, and testability
Agree on a workflow for collaborating before starting a manuscript
DRY (don’t repeat yourself)
Link files explicitly
Plan data management
Use version control
Postpone optimization
License your software
Jenny Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, Tracy Teal, and Greg Wilson, Good enough practices for scientific computing, 2016-01.
See the syllabus.
Start your week 0 assignments.
Imagine that you were the author of the “Load cell calibration report”
Carefully review the report and answer Mystery question 2:
Identify as many “manual operations” as possible.